25 research outputs found

    Algorithmic advancements in Computational Historical Linguistics

    The use of computational methods in historical linguistics has grown considerably in recent years. The increasing availability of machine-readable data and the growing power of computers have fostered this development. The computational methods used in this research stem from several scientific disciplines, with tools from computational biology (bioinformatics) providing the initial spark. Drawing inspiration from advances in related fields, this thesis aims to improve existing computational methods in different areas of computational historical linguistics. Using advances from machine learning and natural language processing research, I present an updated training regime for cognate detection algorithms. Besides achieving state-of-the-art performance in a cognate clustering task, the updated training scheme considerably reduces computation time. Following up on these results, I develop a novel combination of tools from bioinformatics and historical linguistics. By defining an explicit model of sound evolution, I include the notion of evolutionary time in the cognate detection task. The resulting posterior distributions are used to evaluate the model on a standard cognate detection task. Another classical problem in phylogenetic research is the inference of a tree. Current quasi "industry-standard" methods use the classical Metropolis-Hastings algorithm. However, this algorithm is known to be comparatively inefficient for high-dimensional and correlated data. To address this problem, the last chapter presents an algorithm that uses Hamiltonian dynamics.
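    The contrast between plain Metropolis-Hastings and a sampler driven by Hamiltonian dynamics can be illustrated with a minimal sketch. This is not the thesis's algorithm, which targets tree inference; it is a generic Hamiltonian Monte Carlo step on a toy one-dimensional standard normal target. The function names (`leapfrog`, `hmc_sample`), the step size, and the number of leapfrog steps are assumptions chosen for illustration.

```python
import math
import random


def leapfrog(q, p, grad_u, step, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p -= 0.5 * step * grad_u(q)          # initial half step for momentum
    for i in range(n_steps):
        q += step * p                    # full step for position
        if i < n_steps - 1:
            p -= step * grad_u(q)        # full step for momentum
    p -= 0.5 * step * grad_u(q)          # final half step for momentum
    return q, p


def hmc_sample(u, grad_u, q0, n_samples, step=0.2, n_steps=10, seed=0):
    """Draw samples from exp(-u(q)) using Hamiltonian Monte Carlo."""
    rng = random.Random(seed)
    q, samples = q0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)          # resample an auxiliary momentum
        q_new, p_new = leapfrog(q, p, grad_u, step, n_steps)
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integrator's discretisation error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


# Toy target (assumption): standard normal, so u(q) = q^2 / 2.
def u(q):
    return 0.5 * q * q


def grad_u(q):
    return q


samples = hmc_sample(u, grad_u, q0=0.0, n_samples=5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

    Because each proposal follows the gradient of the target for several steps, distant states can be proposed with a high acceptance probability, which is the property that makes this family of samplers attractive for high-dimensional, correlated posteriors.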

    An approach to cross-concept cognacy identification

    It is a well-known phenomenon in historical linguistics that the meaning of a proto-form can differ from the meaning of its descendants. This phenomenon of meaning change is often ignored in studies that use tools from statistical phylogenetic analysis to determine language relationships. It has been shown that the databases currently used in linguistic phylogeny contain a considerable number of such cases. The current study proposes a method to detect such instances of cross-concept relationships between words. Although the evaluation cannot be done by standard means, the results indicate that semantic similarity is a good indicator of cross-concept relationships and that tools from computational biology offer a good framework for this kind of approach.

    Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?

    We evaluate the performance of state-of-the-art algorithms for automatic cognate detection by comparing how useful automatically inferred cognates are for the task of phylogenetic inference compared to classical manually annotated cognate sets. Our findings suggest that phylogenies inferred from automated cognate sets come close to phylogenies inferred from expert-annotated ones, although on average, the latter are still superior. We conclude that future work on phylogenetic reconstruction can profit much from automatic cognate detection. Especially where scholars are merely interested in exploring the bigger picture of a language family’s phylogeny, algorithms for automatic cognate detection are a useful complement for current research on language phylogenies.

    Examining the generalizability of research findings from archival data

    This initiative systematically examined the extent to which a large set of archival research findings generalizes across contexts. We repeated the key analyses for 29 original strategic management effects in the same context (direct reproduction) as well as in 52 novel time periods and geographies; 45% of the reproductions returned results matching the original reports, as did 55% of tests in different spans of years and 40% of tests in novel geographies. Some original findings were associated with multiple new tests. Reproducibility was the best predictor of generalizability: for the findings that proved directly reproducible, 84% emerged in other available time periods and 57% emerged in other geographies. Overall, only limited empirical evidence emerged for context sensitivity. In a forecasting survey, independent scientists were able to anticipate which effects would find support in tests in new samples.

    Phylogenetic Typology

    In this article we propose a novel method to estimate the frequency distribution of linguistic variables while controlling for statistical non-independence due to shared ancestry. Unlike previous approaches, our technique uses all available data, from language families large and small as well as from isolates, while controlling for different degrees of relatedness on a continuous scale estimated from the data. Our approach involves three steps. First, distributions of phylogenies are inferred from lexical data. Second, these phylogenies are used as part of a statistical model to estimate transition rates between parameter states. Finally, the long-term equilibrium of the resulting Markov process is computed. As a case study, we investigate a series of potential word-order correlations across the languages of the world.
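    The final step, computing the long-term equilibrium of a Markov process from its transition rates, can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the rate values are invented, the function name `stationary_distribution` is hypothetical, and the chain is assumed to be irreducible. The sketch converts the rate matrix into a discrete-time transition matrix (uniformisation) and iterates until the distribution stops changing.

```python
def stationary_distribution(Q, tol=1e-12, max_iter=100_000):
    """Approximate pi with pi Q = 0 and sum(pi) = 1 for a rate matrix Q.

    Q[i][j] is the transition rate from state i to state j (i != j),
    and each diagonal entry is minus the sum of the other row entries.
    """
    n = len(Q)
    # Uniformisation rate: slightly larger than the fastest exit rate.
    lam = 1.1 * max(-Q[i][i] for i in range(n))
    # P = I + Q / lam is a stochastic matrix with the same equilibrium.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical rates for a binary typological feature with states 0 and 1.
q01, q10 = 0.3, 0.1
Q = [[-q01, q01],
     [q10, -q10]]
pi = stationary_distribution(Q)
```

    For two states the result agrees with the closed form pi_0 = q10 / (q01 + q10) and pi_1 = q01 / (q01 + q10), so with the rates above the equilibrium frequencies are approximately 0.25 and 0.75.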

    Heterogeneity of the Axon Initial Segment in Interneurons and Pyramidal Cells of Rodent Visual Cortex

    The microdomain that orchestrates action potential initiation in neurons is the axon initial segment (AIS). It has long been considered to be a rather homogeneous domain at the very proximal axon hillock with relatively stable length, particularly in cortical pyramidal cells. However, studies in other brain regions paint a different picture. In hippocampal CA1, up to 50% of axons emerge from basal dendrites. Further, in about 30% of thick-tufted layer V pyramidal neurons in rat somatosensory cortex, axons have a dendritic origin. Consequently, the AIS is separated from the soma. Recent in vitro and in vivo studies have shown that cellular excitability is a function of AIS length/position and somatodendritic morphology, underscoring a potentially significant impact of AIS heterogeneity on neuronal function. We therefore investigated neocortical axon morphology and AIS composition, hypothesizing that the initial observation of seemingly homogeneous AIS is inadequate and needs to take into account neuronal cell types. Here, we biolistically transfected cortical neurons in organotypic cultures to visualize the entire neuron and classify cell types in combination with immunolabeling against AIS markers. Using confocal microscopy and morphometric analysis, we investigated axon origin, AIS position, length, and diameter, as well as distance to the soma. We find substantial AIS heterogeneity in visual cortical neurons, classified into three groups: (I) axons with somatic origin with proximal AIS at the axon hillock; (II) axons with somatic origin with distal AIS, with a discernible gap between the AIS and the soma; and (III) axons with dendritic origin (axon-carrying dendrite cell, AcD cell) and an AIS either starting directly at the axon origin or more distal to that point. Pyramidal cells have significantly longer AIS than interneurons. Interneurons with vertical columnar axonal projections have significantly more distal AIS locations than all other cells, and their prevailing phenotype is the AcD cell. In contrast, neurons with perisomatic terminations most often display an axon originating from the soma. Our data contribute to the emerging understanding that AIS morphology is highly variable, and potentially a function of the cell type.